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We provide a new quantum algoritlim tiiat efficiently determines tiie quality of a least-squares 
fit over an exponentially large data set by building upon an algorithm for solving systems of linear 
equations efficiently (Harrow et al., Phys. Rev. Lett. 103, f50502 (2009)). In many cases, our 
. . . algorithm can also efficiently find a concise function that approximates the data to be fitted and 

04 ' bound the approximation error. In cases where the input data is a pure quantum state, the algorithm 

can be used to provide an efficient parametric estimation of the quantum state and therefore can be 
, applied as an alternative to full quantum state tomography given a fault tolerant quantum computer. 
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PACS numbers: 03.67.-a, 03.67.Ac, 42.50.Dv 



^r^i • Invented as early as 1794 by Carl Friedrich Gauss, fitting data to theoretical models has become over the centuries 
one of the most important tools in all of quantitative science [l[ • Typically, a theoretical model depends on a number 
of parameters, and leads to functional relations between data that will depend on those parameters. Fitting a large 
amount of experimental data to the functional relations allows one to obtain reliable estimates of the parameters. If 
r^i: the amount of data becomes very large, fitting can become very costly. Examples include inversion problems of X-ray 
' or neutron scattering data for structure analysis, or high-energy physics with giga-bytes of data produced per second 
\ at the LHC. Typically, structure analysis starts from a first guess of the structure, and then iteratively tries to improve 
^ ' the fit to the experimental data by testing variations of the structure. It is therefore often desirable to test many 
O^] different models, and compare the best possible fits they provide before committing to one for which one extracts 
then the parameters from the fit. Obtaining a good fit with a relatively small number of parameters compared to the 
amount of data can be considered a form of data compression. Indeed, also for numerically calculated data, such as 
^ , many-body wave-functions in molecular engineering, efficient fitting of the wave-functions to simpler models would 
CN ' be highly desirable. 

\l \ With the rise of quantum information theory, one might wonder if a quantum algorithm can be found that solves 
' these problems efficiently. The discovery that exploiting quantum mechanical effects might lead to enhanced computa- 
. ' tional power compared to classical information processing has triggered large-scale research aimed at finding quantum 
1 algorithms which are more efficient than the best classical counterparts 0-01 ■ Although fault-tolerant quantum com- 
' putation remains out of reach at present, quantum simulation is already now on the verge of providing answers to 
^ ^ : questions concerning the states of complex systems that are beyond classical computability [8,j iS] ■ Recently, a quan- 
. . ' tum algorithm (called HHL in the following) was introduced that efficiently solves a linear equation. Fx — b, with 
given vector b of dimension N and sparse Hermitian matrix F "Efficient solution" means that the expectation 
value (x|M|x) of an arbitrary poly-size Hermitian operator M can be found in roughly 0(5"*^^ log(iV)/e) steps [T]| . 
where k is the condition number of F, i.e. the ratio between the largest and smallest eigenvalue of F, s denotes the 
d ■ sparsenes (i.e. the maximum number of non-zero matrix elements of F in any given row or column), and e is the 
maximum allowed distance between the \x) found by the computer and the exact solution. In contrast, they show 
that it is unlikely that classical computers can efficiently solve similar problems because it would imply that quantum 
computers are no more powerful than classical computers. 

While it has remained unclear so far whether expectation values of the form (x|M|x) provide answers to computa- 
tionally important questions, we provide here an adaption of the algorithm to the problem of data fitting that allows 
one to efficiently obtain the quality of a fit without having to learn the fit-parameters. Our algorithm is particularly 
useful for fitting data efficiently computed by a quantum computer or quantum simulator, especially if an evolution 
can be efficiently simulated but no known method exists to efficiently learn the resultant state. For example, our 
algorithm could be used to efficiently find a concise matrix-product state approximation to a groundstate yielded by 
a quantum many-body simulator and assess the approximation error. More complicated states can be used in the 
fit if the quantum computer can efficiently prepare them. Fitting quantum states to a set of known functions is an 
interesting alternative to performing full quantum-state tomography [l^ . 

Least-squares fitting- The goal in least-squares fitting is to find a simple continuous function that well approximates 
a discrete set of N points {xi,yi}. The function is constrained to be linear in the fit parameters A S C*^, but it can be 
non-linear in x. For simplicity we consider a; G C, but the generalization to higher dimensional x is straight-forward. 
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Our fit function is tlien of the form 

M 

fix,X) :=^/,(x)A, 

where A^- is a component of A and f{x, A) : C*^+^ i— >■ C. The optimal fit parameters can be found by minimizing 

N 

E = J2\f{x,,X)-y,\' = \FX-y\' (1) 

i=l 

over all A, where we have defined the N x M matrix F through Fij = fj{xi), F* is its transpose, and y denotes the 
column vector (yi, . . . jUnY- Also, following HHL, we assume without loss of generality that < ||F'''F|| < 1 and 
< ||FF^|| < 1 Throughout this Letter we use || • || to denote the spectral norm. 

Given that F^F is invertible, the fit parameters that give the least square error are found by applying the Moore- 
Penrose pseudoinverse of F, F+, to y: 

A = F+y = (F^F)-IfV- (2) 

A proof that ([2]) gives an optimal A for a least-square fit is given in the appendix. 

The algorithm consists of three subroutines: a quantum algorithm for performing the pseudo-inverse, an algorithm 
for estimating the fit quality and an algorithm for learning the fit-parameters A. 



1. Fitting Algorithm — Our algorithm uses a quantum computer and oracles that output quantum states that 
encode the matrix elements of F to approximately prepare F+y. The matrix multiplications, and inversions, are 
implemented using an improved version of the HHL algorithm that utilizes recent developments in quantum 
simulation algorithms. 

Input: A quantum state |y) = X^pIaz+i ypb)/|y| that stores the data y, an upper bound (denoted k) for the square 
roots of the condition numbers of FF^ and F^F, the sparseness of F (denoted s) and an error tolerance e. 

Output: A quantum state |A) that is approximately proportional to the optimal fit parameters A/|A| up to error e as 
measured by the Euclidean-norm. 

Computational Model: We have a universal quantum computer equipped with oracles that, when queried about a 
non-zero matrix element in a given row, yield a quantum state that encodes a requested bit of a binary encoding the 
column number or value of a nonzero matrix element of F in a manner similar to those in [l^ . We also assume a 
quantum blackbox is provided that yields copies of the input state |y) on demand. 

Query Complexity: The number of oracle queries used is 

6(log(7V)(s3^6)/e), (3) 

where O notation implies an upper bound on the scaling of a function, suppressing all sub-polynomial functions. 
Alternatively, the simulation method of [isl . [l6j can be used to achieve a query complexity of 

d{\og{N){sK')/e'). (4) 

Analysis of Algorithm — The operators F and F^ are implemented using an isometry superoperator I to represent 
them as Hermitian operators on C^+*^. The isometry has the following action on a matrix X: 

These choices are convenient because I(F''')|y) contains Fty/|y| in its first M entries. We also assume for simplicity 
that |I(F'l')|y)| = 1. This can easily be relaxed by dividing I(Ft)|y) by |FV|- 

Preparing I(F''')|y) — The next step is to prepare the state I(F''')|y). This is not straightforward because I(F'I') is 
a Hermitian, rather than unitary, operator. We implement the Hermitian operator using the same phase estimation 
trick that HHL use to enact the inverse of a Hermitian operator, but instead of dividing by the eigenvalues of each 
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eigenstate we multiply each eigenstate by its eigenvalue. We describe the relevant steps below. For more details, 
see [13. 

The algorithm first prepares an ancilla state for a large integer T that is of order N 

l*o) = y|Esin(^^^)|r)«|y). (6) 

It then maps |5'o) to, 

/lEsinf^^^i^) |r)«e-(-^)-o/^|y), (7) 

for to G 0{n/e). We know from work on quantum simulation that exp(— iI(F^)rto/T) can be implemented within 
error C(e) in the 2-norm using 0{\og{N)s^tQ/T) quantum operations, if F has sparseness s Alternatively, the 
method of [H, gives query complexity 0(log(A^)srto/(eT)). If we write |y) = X^^i/^jIMj): where are the 
eigenvectors of I(F'I') with eigenvalue Ej we obtain 

a/|E sin {^-^^^) e--^-°/^|r) ^ /?>,), (8) 

r=0 ^ ^ 

The quantum Fourier transform is then applied to the first register and, after labeling the Fourier coefficients a^ij, 
the state becomes 

N T-1 

Y,Y.'^k\,P^k)\^i,). (9) 

HHL show that the Fourier coefficients are small unless the eigenvalue Ej Ki E^ '■— 27rfc/to, and € 0{n/e) is needed 
to ensure that the error from approximating the eigenvalue is at most e. It can be seen using the analysis in [loj that 
after re-labeling \k) as \Ek), and taking T £ 0{N), ^ is exponentially close to X!j=i f^j\^j)\t^j)- 

The final step is to introduce an ancilla system and perform a controlled unitary on it that rotates the ancilla state 

from |0) to y'l — C^Ej\0) + CEj\l), where C £ 0(maxj because the state would not be properly normahzed 

if C were larger. The probability of measuring the ancilla to be 1 is 0(1/k^) since CEj is at least 0{k^) 
repetitions are therefore needed to guarantee success with high probability, and amplitude amplification can be used 
to reduce the number of repetitions to 0{k) HHL show that either 0{1/k^) or 0{1/k) attempts are also needed 
to successfully perform I(F)~^ depending on whether amplitude amplification is used. 

The cost of implementing I(F'I') is the product of the cost of simulating I(rt) for time n/e and the number of 
repetitions required to obtain a successful result, which scales as 0{k). The improved simulation method of Childs 
and Kothari allows the simulation to be performed in time 0{\og{N)s'^K/e), where s is the sparseness of F; 
therefore, I(F''')|y) can be prepared using C)(log(-/V)s^K^/e) oracle calls. The cost of performing the inversion using 
the simulation method of [lllllal is found by substituting s s^^^/e into this or any of our subsequent results. 

Inverting F^F — We then finish the algorithm by applying (F^F)^^ using the method of HHL [l^]. Note that the 
existence of (F'''F)~^ is implied by a well-defined fitting-problem, in the sense that a zero eigenvalue of F^F would 
result in a degenerate direction of the quadratic form ([T]). The operator F'^'F € ([^mxm jg jjermitian and hence 
amenable to the linear systems algorithm. We do, however, need to extend the domain of the operator to make it 
compatible with |y) which is in a Hilbert space of dimension N + M. We introduce A to denote the corresponding 
operator. 

If we define |A) € C^"*"^^ to be a state of the form |A) — X^jli ^p to a normalizing constant, then F'l'FA is 
proportional to A|A) up to a normalizing constant. This means that we can find a vector that is proportional to the 
least-squares fit parameters by inversion via 



|A)=A-il(Ft)|y). 



(11) 
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This can be further simphfied by noting that 

A-i=I(F)-2. (12) 

Amphtude amphfication does not decrease the number of attempts needed to implement A"-'^ in (jlip because the 
algorithm require reflections about I(F'^)|y), which requires 0{k) repetitions to prepare. 

Since amplitude amplification provides no benefit for implementing A~^, O(k^) repetitions are needed to implement 
A~^I(F'I'). This is a consequence of the fact that the probability of successfully performing each I(F)~-'^ is 0(1/k^) 
and the probability of performing I(F^) is 0{1/k) (if amplitude amplification is used). The cost of performing the 
simulations involved in each attempt is 0{log{N)s'^K/e) and hence the required number of oracle calls scales as 

d{\og{N){s\ye)). (13) 

Although the algorithm yields |A) efficiently, it may be exponentially expensive to learn |A) via tomography; 
however, we show below that a quantum computer can assess the quality of the fit efficiently. 

2. Estimating Fit Quality — We will now show that we can efficiently estimate the fit quality E even if M is 
exponentially large and without having to determine the fit-parameters. For this problem, note that due to the 
isometry ^ E = ||y) — I(F)|A)p. We assume the prior computational model. We are also provided a desired error 
tolerance, e, and wish to determine the quality of the fit within error S. 

Input: A constant 5 > and all inputs required by algorithm 1. 

Output: An estimate of |(y|I(F)|A)p accurate within error 6. 

Query Complexity: 

a(^log(A)^). (14) 



Algorithm — We begin by preparing the state |y) (g) |y) using the provided state preparation blackbox. We then use 
the prior algorithm to construct the state 

I(F)A-il(Ft)|y) ® |y) = I(F)-il(Ft)|y) ® |y), (15) 

within error 0{e). The cost of implementing I(F)^^I(F^) (with high probability) within error e is 

6 (^log(A)^^ . (16) 

The swap test fisl] is then used to determine the accuracy of the fit. The swap test is a method that can be used 
to distinguish |y) and I(F)|A) by performing a swap operation on the two quantum states controlled by a qubit in 
the state (10) + |l))/\/2. The Hadamard operation is then applied to the control qubit and the control qubit is then 
measured in the computational basis. The test concludes that the states are different if the outcome is "1" . The 
probability of observing an outcome of "1" is (1 — |(y|I(F)|A)p)/2 for our problem. 

The overlap between the two quantum states can be learned by statistically sampling the outcomes from many 
instances of the swap test. The value of | (y|I(F) | A) p can be approximated using the sample mean of this distribution. 
It follows from estimates of the standard deviation of the mean that 0{l/6^) samples are required to estimate the 
mean within error 0{S). The cost of algorithm 2 is then found by multiplying by 1/(5^. 

The quantity E can be estimated from the output of algorithm 2 by E' < 2(1 — |(y|I(F)|A)|). Taylor series analysis 
shows that the error in the upper bound for E is also 0{6). 

There are several important limitations to this technique. First, if F is not sparse (meaning s e 0(poly(iV))) then 
the algorithm may not be efficient because the quantum simulation step used in the algorithm may not be efficient. 
As noted in previous results we can generalize our results to systems where F is non-sparse if there exists 

a set of efficient unitary transformations Uj such that 1(F) = J^j UjHjUj where each Hj is sparse and Hermitian. 
Also, in many important cases (such as fitting to experimental data) it may not be posible to prepare the initial state 
|y) efficiently. For this reason, our algorithm is better suited for approximating the output of quantum devices than 
the classical outputs of experiments. Finally, algorithm 2 only provides an efficient estimate of the fit quality and 
does not provide A; however, we can use it to determine whether a quantum state has a concise representation within 
a family of states. If algorithm 2 can be used to find such a representation, then the parameters |A) can be learned 
via state tomography. We discuss this approach below. 

3. Learning X- This method can also be used to find a concise fit function that approximates y. Specifically, we 
use statistical sampling and quantum state tomography to find a concise representation for the quantum state using 
M' parameters. The resulting algorithm is efficient if M' e C'(polylog(A^)). 
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Input: As algorithm 2, but in addition with an integer M' G C'(polylog(M)) that gives the maximum number of fit 
functions allowed in the fit. 

Output: A classical bit string approximating |A) to precision e, a list of the Af fit functions that comprise |A) and 
|(y|I(F)|A)p calculated to precision o. 

Computational Model: As algorithm 1, but the oracles can be controlled to either fit the state to all M fit functions 
or any subset consisting of M' fit functions. 

Query Complexity: 



Algorithm — The first step of the algorithm is to prepare the state |A) using algorithm 1. The state is then measured 
0{M') times and a histogram of the measurement outcomes is constructed. Since the probability of measuring each 
of these outcomes is proportional to their relevance to the fit, we are likely to find the M' of the most likely outcomes 
by sampling the state O(M') times. 

After choosing the M' most significant fit functions, we remove all other fit functions from the fit and prepare the 
state |A) using the reduced set of fit functions. Compressed sensing (T9l - [2l| is then used to reconstruct |A) within 
C(e) error. The idea of compressed sensing is that a low-rank density matrix can be uniquely determined (with high 
probability) by a small number of randomly chosen measurements. A convex optimization routine is then used to 
reconstruct the density matrix from the expectation values found for each of the measurements. 

Compressed sensing requires 0(Af ' log(M')^) measurement settings to reconstruct pure states, and observation 1 
of [l^ implies that 0{M' /e^) measurements are needed for each setting to ensure that the reconstruction error is 
0(e); therefore, ©(Af log(M')^/e^) measurements are needed to approximate the state within error ©(e). The total 
cost of learning |A) is the number of measurements needed for tomography multiplied by the cost of preparing the 
state and thus scales as 



which subsumes the cost of measuring |A) to find the most significant M' fit functions. 

Finally, we measure the quality of the fit using algorithm 2. The total cost of estimating |A) and the fit quality is 
thus the sum of (fTTj) and (fT6|) . as claimed. 

Remark: The quality of the resulting fit that is yielded by this algorithm depends strongly on the set of fit functions 
that are used. If the fit functions are chosen well, fewer than M' fit functions are used to estimate |y) with high 
fidelity. Conversely, 0{N) fit functions may be needed to achieve the desired error tolerance if the fit functions are 
chosen poorly. Fortunately, the efficiency of algorithm 2 allows the user to search many sets of possible fit functions 
for a concise and accurate model within a large set of potential models. 
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Here we review an elementary proof 23] of why applying Moore-Penrose pseudoinverse to the complex-valued 
vector y yields parameters that minimize the least-squares fit of the initial state. To begin, we need to prove some 
properties of the pseudoinverse. First, 





(17) 



Appendix A: Moore-Penrose Pseudoinverse 



(FF+)t = FF+. 



(Al) 



The proof of this property is 



(FF+)t = (F(Fl'F)-iFt)t = (F((FtF)-i)1'Ft). 



(A2) 



The result of (|Al[) then follows by noting that F'^'F is self-adjoint. 
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Next, we need the property that 

FF+F = F. (A3) 

This property follows directly from substituting in the definition of F+ into the expression. 
The final property we need is 

Ft(FF+y-y) = 0. (A4) 

Using property (jAip we find that 

Ft(FF+y-y) = (FF+F)V-FV. (A5) 

Property (|A3[) then implies that 

(ff+f)V-fV = fV-fV = o. (A6) 

For simplicity, we will express z = F^y and then find 

||FA-y||2 = l|Fz-y + (FA-Fz)|p. (A7) 

Expanding this relation yields, 

UFA -yf = ||Fz - y||2 + ||F(A - z)||2 + (Fz - y)tF(A - z) + (A - z)tFt(Fz - y). (A8) 

Property (jA4|) then implies that F^ (Fz — y) = and hence 

||FA-y|p = ||Fz-y||2 + ||F(A-z)f 

>||FF+y-y||, (A9) 

which holds with equality if A = z = F+y. Therefore, applying the Moore-Penrose peudoinverse to y provides fit 
parameters that minimize the least-square error. 
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